Analysis of spontaneous Japanese in a multi-language telephone-speech corpus
Authors
Abstract
Takayuki Arai, Natasha Warner and Steven Greenberg. Affiliations: Department of Electrical and Electronics Engineering, Sophia University, 7–1 Kioi-cho, Chiyoda-ku, Tokyo, 102–8554 Japan; Department of Linguistics, University of Arizona, PO Box 210028, Tucson, AZ 85721–0028, USA; Silicon Speech, 46 Oxford Drive, Santa Venetia, CA 94903, USA, and Centre for Applied Hearing Research, Technical University of Denmark, Kgs. Lyngby, DK-2800, Denmark
Similar resources
The OGI multi-language telephone speech corpus
The OGI Multi-language Telephone Speech Corpus is designed to support research on automatic language identification and multi-language speech recognition. The corpus consists of up to nine separate responses from each caller, ranging from single words to short topic-specific descriptions to 60 seconds of unconstrained spontaneous speech. The utterances were spoken over commercial telephone lines ...
Selection of Multi-Word Expressions from Web N-gram Corpus for Speech Recognition
This paper proposes a method for constructing a statistical language model with multi-word expressions (MWEs) selected from Google Japanese Web N-gram. MWEs are concatenated words that consist of idiomatic expressions or frequently used long morpheme sequences. In this paper, a method for selecting the effective MWEs that improve the language model based on co-occurrence probabilities of ...
Spontaneous Speech Corpus of Japanese
Design issues of a spontaneous speech corpus are described. The corpus under compilation will contain 800-1000 hours of spontaneously uttered Common Japanese speech and morphologically annotated transcriptions. Also, segmental and intonation labeling will be provided for a subset of the corpus. The primary application domain of the corpus is speech recognition of spontaneous speech, but we plan ...
Analysis of Language Variation Using a Large-Scale Corpus of Spontaneous Speech
A large-scale corpus of spontaneous speech can be a powerful tool for the study of language variation. Moreover, given that the corpus is publicly available, corpus-based analysis could open up the possibility of follow-up analysis in this area of linguistic study. Generally speaking, follow-up study is highly desirable in the sciences, but so far it has been virtually impossible in the area of socio-...
Automatic Estimation of Speaking Rate in Multilingual Spontaneous Speech
An automatic method for estimating speaking rate is developed in this paper. It is based on an unsupervised vowel detection algorithm and thus may be applied to any language at no additional cost. Validation is performed on a spontaneous speech subset of the OGI Multilingual Telephone Speech Corpus. The correlation coefficient between the estimated and real speaking rates (evaluated in terms of vowel-per-second rates...